analysis. These results will be proved under mild conditions on the underlying distribution.


1. INTRODUCTION


In the area of model selection, various procedures have been proposed in the literature and their properties have been examined. In this paper we consider a generalized information criterion (GIC) obtained by the information-theoretic approach. According to this procedure, we find the model which minimizes

$$\mathrm{GIC} = -2\log L(\hat\theta) + c_N\,p,$$

where $L(\hat\theta)$ is the maximized likelihood and p is the number of parameters.

Akaike (1973) proposed to take $c_N = 2$, and Rissanen (1978) and Schwarz (1978) proposed $c_N = \log N$, where N denotes the sample size (see also Akaike (1978) and Hannan and Quinn (1979)). Recently Zhao, Krishnaiah and Bai (1986) considered the GIC such that (i) $\lim_{N\to\infty} c_N/N = 0$ and (ii) $\lim_{N\to\infty} c_N/\log\log N = +\infty$. This criterion is sometimes referred to as the efficient detection (ED) criterion. They used it to determine the number of signals under a signal processing model.
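To make the role of $c_N$ concrete, here is a minimal Python sketch (not taken from the paper) that evaluates the GIC for a set of fitted models and returns the minimizer. The function names `gic` and `select`, and the two candidate models with their log-likelihoods, are hypothetical and chosen only to show that a heavier penalty can change the selected model.

```python
import math


def gic(max_loglik: float, n_params: int, c_n: float) -> float:
    """Generalized information criterion: -2 log L(theta_hat) + c_N * p."""
    return -2.0 * max_loglik + c_n * n_params


def select(models: dict, c_n: float) -> str:
    """Return the name of the model with the smallest GIC.

    `models` maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(models, key=lambda name: gic(models[name][0], models[name][1], c_n))


# Hypothetical fitted models: (max log-likelihood, number of parameters).
models = {"small": (-1000.0, 3), "large": (-995.0, 6)}
N = 500

aic_choice = select(models, 2.0)          # Akaike (1973): c_N = 2
mdl_choice = select(models, math.log(N))  # Rissanen / Schwarz (1978): c_N = log N

# c_N = log N also meets the ED conditions: c_N/N -> 0 and c_N/log log N -> infinity.
for n in (10**3, 10**6, 10**9):
    print(n, math.log(n) / n, math.log(n) / math.log(math.log(n)))
```

With $c_N = 2$ the larger model wins, while $c_N = \log N$ selects the smaller one; the printed ratios only illustrate, and do not prove, the two limits.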

In the present paper, we propose to use the ED criterion for certain problems of multivariate analysis. Sometimes a statistician is expected to predict the explanatory variables using some of the response variables under the multivariate regression model. This problem is treated in Section 2 by using the ED criterion, and its consistency is established. Here we may note that Nishii (1986) pointed out the inconsistency of Akaike's AIC in calibration. In Section 3 we discuss the selection of variables in discriminant analysis. Our interest is to find the variables which contribute to discrimination between the populations. Section 4 is concerned with the selection of variables in canonical correlation analysis, i.e., given two sets of variables, we want to find which subsets are important for studying the association between the two sets. The investigations for the above cases are made under a mild condition on the underlying distribution.


2. MULTIVARIATE CALIBRATION

Let q explanatory variables $x = (x_1, \dots, x_q)'$ and p response variables $y = (y_1, \dots, y_p)'$ have the linear relation

$$y = \alpha + \beta'x + e, \qquad (2.1)$$

where e follows $N_p[0, \Sigma]$, and $\alpha$: p × 1, $\beta$: q × p and $\Sigma$: p × p are parameters. Suppose we are interested in estimating x by using an observed y. If all parameters are known, the maximum likelihood estimate of the unknown explanatory variables x is given by

$$\hat x = (\beta\Sigma^{-1}\beta')^{-}\beta\Sigma^{-1}(y - \alpha), \qquad (2.2)$$

where $(\beta\Sigma^{-1}\beta')^{-}$ is a g-inverse of $\beta\Sigma^{-1}\beta'$. However, if the last column of $\beta\Sigma^{-1}$ is a zero vector, the response variable $y_p$ supplies no additional information on x in the multivariate linear model (see §4 of Rao (1973)). Hence, we want to obtain the best subset of response variables such that each of its elements carries some information. For this problem, criteria based on information theory can be used. For a review of the literature on multivariate calibration, the reader is referred to Brown (1982).
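For known parameters, (2.2) can be evaluated directly. The sketch below assumes NumPy and uses the Moore-Penrose inverse as one particular g-inverse; the function name `calibrate`, the dimensions and the parameter values are illustrative assumptions. With a noise-free response and $\beta\Sigma^{-1}\beta'$ nonsingular, it should recover x exactly.

```python
import numpy as np


def calibrate(y, alpha, beta, sigma):
    """x_hat = (beta Sigma^{-1} beta')^- beta Sigma^{-1} (y - alpha), as in (2.2).

    beta is q x p. The Moore-Penrose inverse serves as one particular g-inverse.
    """
    sigma_inv_beta_t = np.linalg.solve(sigma, beta.T)   # Sigma^{-1} beta', p x q
    a = beta @ sigma_inv_beta_t                         # beta Sigma^{-1} beta', q x q
    rhs = sigma_inv_beta_t.T @ (y - alpha)              # beta Sigma^{-1} (y - alpha)
    return np.linalg.pinv(a) @ rhs


# Illustrative parameters (hypothetical values).
rng = np.random.default_rng(0)
q, p = 2, 4
beta = rng.normal(size=(q, p))
alpha = rng.normal(size=p)
sigma = np.eye(p) + 0.3 * np.ones((p, p))   # positive definite
x_true = np.array([1.5, -0.5])
y = alpha + beta.T @ x_true                  # noise-free response
x_hat = calibrate(y, alpha, beta, sigma)
print(x_hat)
```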

Let J be a subset of the index set $\{1, \dots, p\}$ of the response variables. We say that "the assumed model is J" when we regard that $y_j$ ($j \in J$) provides information on x whereas $y_{j'}$ ($j' \notin J$) does not. We assume that a true model $J_t = \{1, \dots, p_t\}$ exists but is unknown, and let $p_t < p$. This assumption is equivalent to

$$\operatorname{tr}\beta\Sigma^{-1}\beta' = \operatorname{tr}\beta_t\Sigma_{tt}^{-1}\beta_t' \qquad (2.3)$$

and $\operatorname{tr}\beta_J\Sigma_{JJ}^{-1}\beta_J' < \operatorname{tr}\beta\Sigma^{-1}\beta'$ if $J \not\supseteq J_t$, where $\beta_J$: q × #J and $\Sigma_{JJ}$: #J × #J are the submatrices of $\beta$: q × p and $\Sigma$: p × p corresponding to a subset J, #J denotes the number of elements of J, and $\beta_t$: q × $p_t$ and $\Sigma_{tt}$: $p_t$ × $p_t$, corresponding to $J_t$, are defined similarly (see McKay (1977) and Fujikoshi (1983)).

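When all parameters are unknown and N independent observations $y_i$ at $x_i$ (i = 1, ..., N) with the relationship (2.1) are given, we use the estimates of $\alpha$, $\beta$ and $N\Sigma$ given by

$$\hat\alpha = \bar y - B'\bar x, \qquad B = S_{XX}^{-1}S_{XY} \qquad\text{and}\qquad S = S_{YY} - B'S_{XX}B, \qquad (2.4)$$

where

$$\begin{pmatrix} S_{XX} & S_{XY}\\ S_{YX} & S_{YY}\end{pmatrix} = \sum_{i=1}^N \begin{pmatrix} x_i - \bar x\\ y_i - \bar y\end{pmatrix}\begin{pmatrix} x_i - \bar x\\ y_i - \bar y\end{pmatrix}'. \qquad (2.5)$$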


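Note that S and $B'S_{XX}B$ follow the Wishart distribution $W_p[N - q - 1, \Sigma]$ and the noncentral Wishart distribution $W_p[q, \Sigma; \beta'S_{XX}\beta]$, respectively. The likelihood ratio for the model J against the full model $J_f = \{1, \dots, p\}$ for N calibration samples was obtained by Fujikoshi and Nishii (1986). Hence,

$$G_N(J) = \mathrm{GIC}(J) - \mathrm{GIC}(J_f) = \Lambda(J) - q(p - \#J)c_N, \qquad (2.6)$$

where $B_J$ is the q × #J submatrix of B corresponding to J and

$$\Lambda(J) = N\log\frac{|S_{JJ}|\,|S + B'S_{XX}B|}{|S|\,|S_{JJ} + B_J'S_{XX}B_J|}. \qquad (2.7)$$

($\Lambda(J)$ is also denoted by $\Lambda(J_f, J)$.)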


We select the model $\hat J_N$ such that

$$G_N(\hat J_N) = \min_J G_N(J). \qquad (2.8)$$
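The following is a minimal NumPy sketch of (2.4) to (2.8) by exhaustive search over all nonempty subsets J, with 0-based indices. The data are simulated under the illustrative assumption $\Sigma = I$ with $\beta$ having zero columns beyond the first two responses, so that (2.3) holds with $J_t = \{1, 2\}$ (indices 0 and 1 in the code); $c_N = \log N$ is one admissible ED penalty. The helper names are hypothetical, and the code uses the identity $S + B'S_{XX}B = S_{YY}$, which follows from (2.4).

```python
import itertools

import numpy as np


def logdet(a):
    """log |a| for a positive definite matrix."""
    return np.linalg.slogdet(a)[1]


def sp_matrices(x, y):
    """Return S of (2.4) and S + B'S_XX B (= S_YY), built from the SP matrices (2.5)."""
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    sxx, sxy, syy = xc.T @ xc, xc.T @ yc, yc.T @ yc
    b = np.linalg.solve(sxx, sxy)          # B = S_XX^{-1} S_XY, q x p
    s = syy - b.T @ sxx @ b                # S = S_YY - B'S_XX B
    return s, syy


def g_n(s, t, subset, n, q, c_n):
    """G_N(J) = Lambda(J) - q(p - #J)c_N, with Lambda(J) as in (2.7)."""
    p = s.shape[0]
    j = np.ix_(subset, subset)
    lam = n * (logdet(s[j]) + logdet(t) - logdet(s) - logdet(t[j]))
    return lam - q * (p - len(subset)) * c_n


# Simulated calibration data: Sigma = I and only y_1, y_2 depend on x.
rng = np.random.default_rng(1)
n, q, p = 2000, 2, 4
beta = np.zeros((q, p))
beta[:, :2] = [[1.0, 0.5], [-0.5, 1.0]]
x = rng.normal(size=(n, q))
y = 0.3 + x @ beta + rng.normal(size=(n, p))

s, t = sp_matrices(x, y)
c_n = np.log(n)                            # satisfies (i) and (ii)
subsets = [c for k in range(1, p + 1) for c in itertools.combinations(range(p), k)]
scores = {sub: g_n(s, t, list(sub), n, q, c_n) for sub in subsets}
j_hat = min(scores, key=scores.get)        # (2.8): 2^p - 1 evaluations
print(j_hat)
```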


Recall that the criterion function (2.6) is derived when the $y_i$ are normally distributed. However, we apply this procedure when the assumption of normality is relaxed. Nishii (1986) studied the asymptotic behavior of the AIC, i.e., the case $c_N \equiv 2$ in (2.6), under a weak assumption and showed that the AIC is not consistent in the multivariate calibration problem. If we use the ED criterion, $c_N$ is chosen such that

$$\text{(i)}\ \lim_{N\to\infty} c_N/N = 0, \qquad \text{(ii)}\ \lim_{N\to\infty} c_N/\log\log N = \infty.$$

We will show that the ED criterion is strongly consistent under the following mild conditions:

ASSUMPTION 1. The error vectors $e_i$ of $y_i$ (i = 1, ..., N, ...) are independently and identically distributed (i.i.d.) with

$$Ee_i = 0, \qquad Ee_ie_i' = \Sigma \qquad\text{and}\qquad E(e_i'e_i)^{\gamma/2} < \infty. \qquad (2.9)$$


[Illegible in the source: Assumption 2, Lemma 2.1, the statement of Theorem 2.1 and the first part of its proof, including (2.15).]


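On the other hand, $G_N(J_f) = 0$ for any N by the definition of $G_N$. This yields that the ED criterion asymptotically prefers $J_f$ to J if $J \not\supseteq J_t$. When $J_t = J_f$, the proof is complete. Otherwise, we first consider the case $J = J_t$. Denote

$$S = \begin{pmatrix} S_{tt} & S_{t1}\\ S_{1t} & S_{11}\end{pmatrix}, \quad S_{tt}: p_t \times p_t, \qquad B = [B_t, B_1]: q \times p,$$

with $B_t$: q × $p_t$. Let $S_{11\cdot t} = S_{11} - S_{1t}S_{tt}^{-1}S_{t1}$ and define $(S + B'S_{XX}B)_{11\cdot t}$ in a similar way. Put $U = S_{XX}^{1/2}B = [U_t, U_1]$: q × p, with $U_t$: q × $p_t$. From Fujikoshi (1983), we know that

$$(S + B'S_{XX}B)_{11\cdot t} - S_{11\cdot t} = (S + U'U)_{11\cdot t} - S_{11\cdot t} = (U_1 - U_tS_{tt}^{-1}S_{t1})'(I_q + U_tS_{tt}^{-1}U_t')^{-1}(U_1 - U_tS_{tt}^{-1}S_{t1}).$$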
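The identity above can be checked numerically. In this sketch (an illustration, not a proof) S is a random positive definite matrix, U is an arbitrary q × p matrix standing in for $S_{XX}^{1/2}B$, and the first $p_t$ indices play the role of $J_t$; the helper `schur` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
q, p, pt = 3, 5, 2
a = rng.normal(size=(p, p))
s = a @ a.T + p * np.eye(p)                    # a positive definite S
u = rng.normal(size=(q, p))                    # U = S_XX^{1/2} B, any q x p matrix
t_idx, o_idx = np.arange(pt), np.arange(pt, p)  # J_t and its complement


def schur(m):
    """m_{11.t} = m_11 - m_1t m_tt^{-1} m_t1 for the partition above."""
    m11, m1t = m[np.ix_(o_idx, o_idx)], m[np.ix_(o_idx, t_idx)]
    mtt, mt1 = m[np.ix_(t_idx, t_idx)], m[np.ix_(t_idx, o_idx)]
    return m11 - m1t @ np.linalg.solve(mtt, mt1)


lhs = schur(s + u.T @ u) - schur(s)
u_t, u_1 = u[:, t_idx], u[:, o_idx]
s_tt, s_t1 = s[np.ix_(t_idx, t_idx)], s[np.ix_(t_idx, o_idx)]
d = u_1 - u_t @ np.linalg.solve(s_tt, s_t1)            # U_1 - U_t S_tt^{-1} S_t1
mid = np.eye(q) + u_t @ np.linalg.solve(s_tt, u_t.T)   # I_q + U_t S_tt^{-1} U_t'
rhs = d.T @ np.linalg.solve(mid, d)
print(np.max(np.abs(lhs - rhs)))
```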


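By the law of the iterated logarithm and Lemma 2.1, we have, with $\alpha_N = (N^{-1}\log\log N)^{1/2}$,

$$N^{-1}S_{11\cdot t} = \Sigma_{11\cdot t} + O(\alpha_N), \quad \text{a.s.},$$

$$U_tS_{tt}^{-1}U_t' = N^{-1}S_{XX}^{1/2}\beta_t\Sigma_{tt}^{-1}\beta_t'S_{XX}^{1/2} + O(\alpha_N) = O(1), \quad \text{a.s.},$$

$$U_1 - U_tS_{tt}^{-1}S_{t1} = S_{XX}^{1/2}\beta_1 - S_{XX}^{1/2}\beta_t\Sigma_{tt}^{-1}\Sigma_{t1} + O(N^{1/2}\alpha_N) = O(N^{1/2}\alpha_N), \quad \text{a.s.}$$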


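The last equality follows from the relation $\beta_1 = \beta_t\Sigma_{tt}^{-1}\Sigma_{t1}$, which is obtained from (2.3). Hence

$$G_N(J_t) = \Lambda(J_f, J_t) - q(p - p_t)c_N = N\log\frac{|(S + U'U)_{11\cdot t}|}{|S_{11\cdot t}|} - q(p - p_t)c_N$$

$$= N\log\big|I + S_{11\cdot t}^{-1}\{(S + U'U)_{11\cdot t} - S_{11\cdot t}\}\big| - q(p - p_t)c_N$$

$$= O(\log\log N) - q(p - p_t)c_N \to -\infty \quad (N \to \infty), \ \text{a.s.}, \qquad (2.17)$$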
because $p - p_t > 0$ and $\lim_{N\to\infty} c_N/\log\log N = +\infty$. This implies that the ED criterion will not asymptotically select the model $J_f$. When $J \supsetneq J_t$, following similar lines as above, it holds that

$$\Lambda(J_f, J) = O(\log\log N), \quad \text{a.s.}$$

Hence,

$$G_N(J_t) - G_N(J) = \Lambda(J_f, J_t) - \Lambda(J_f, J) - q(\#J - p_t)c_N = O(\log\log N) - q(\#J - p_t)c_N \to -\infty, \quad \text{a.s.}$$

This completes the proof.


However, we must calculate $2^p - 1$ values of $G_N(\cdot)$ to obtain $\hat J_N$ of (2.8). When p is large, this involves extensive computation. To overcome this problem, we propose an alternative procedure, which is also based on the ED criterion. Let $J_{-i} = \{1, \dots, i-1, i+1, \dots, p\}$ for i = 1, ..., p. Define

$$\tilde J_N = \{i \in J_f \mid G_N(J_{-i}) > 0 = G_N(J_f)\}. \qquad (2.18)$$

This subset is obtained by calculating only p + 1 values of $G_N(\cdot)$, but it is still a strongly consistent estimate of $J_t$ (see Zhao, Krishnaiah and Bai (1986)).
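A sketch of rule (2.18) under illustrative assumptions (simulated calibration data with $\Sigma = I$, only the first two responses depending on x, and $c_N = \log N$); indices are 0-based and the function names are hypothetical. Only the p sets $J_{-i}$ are scored, since $G_N(J_f) = 0$ by definition.

```python
import numpy as np


def logdet(a):
    """log |a| for a positive definite matrix."""
    return np.linalg.slogdet(a)[1]


def g_n(s, t, subset, n, q, c_n):
    """G_N(J) = Lambda(J) - q(p - #J)c_N, Lambda(J) as in (2.7) with t = S + B'S_XX B."""
    p = s.shape[0]
    j = np.ix_(subset, subset)
    lam = n * (logdet(s[j]) + logdet(t) - logdet(s) - logdet(t[j]))
    return lam - q * (p - len(subset)) * c_n


def leave_one_out_selection(x, y, c_n):
    """J~_N of (2.18), 0-based: keep y_i if G_N(J_{-i}) > 0 = G_N(J_f)."""
    n, q = x.shape
    p = y.shape[1]
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    sxx, sxy, syy = xc.T @ xc, xc.T @ yc, yc.T @ yc
    s = syy - sxy.T @ np.linalg.solve(sxx, sxy)   # S = S_YY - B'S_XX B
    keep = []
    for i in range(p):                             # only p evaluations of G_N
        j_minus_i = [k for k in range(p) if k != i]
        if g_n(s, syy, j_minus_i, n, q, c_n) > 0:  # S + B'S_XX B = S_YY
            keep.append(i)
    return keep


# Illustrative data: only the first two responses depend on x.
rng = np.random.default_rng(1)
n, q, p = 2000, 2, 4
beta = np.zeros((q, p))
beta[:, :2] = [[1.0, 0.5], [-0.5, 1.0]]
x = rng.normal(size=(n, q))
y = 0.3 + x @ beta + rng.normal(size=(n, p))
selected = leave_one_out_selection(x, y, np.log(n))
print(selected)
```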

THEOREM 2.2. Under Assumptions 1 and 2, we have

$$\lim_{N\to\infty}\tilde J_N = J_t, \quad \text{a.s.}$$

Proof. If $i \in J_t$, then $J_{-i} \not\supseteq J_t$. By (2.15), $G_N(J_{-i})$ tends almost surely to infinity. Hence $\tilde J_N \ni i$ for large N, a.s. If $i \notin J_t$, then $J_{-i} \supseteq J_t$. By a discussion similar to that of (2.17), we have

$$G_N(J_{-i}) \to -\infty \quad \text{as } N \to \infty, \ \text{a.s.}$$

This implies $i \notin \tilde J_N$ for large N, a.s., and this completes the proof.


3. DISCRIMINANT ANALYSIS

The discussion on multivariate calibration can be applied to variable selection in multiple discriminant analysis. Consider q + 1 p-variate normal populations $\Pi_a$ with mean vectors $\mu_a$ and a common covariance matrix $\Sigma$ (a = 1, ..., q + 1). Assume that $N_a$ samples $z_{a1}, \dots, z_{aN_a}$ are drawn from $\Pi_a$. We are interested in interpreting the differences among the q + 1 populations in terms of only a few canonical discriminant variates.

Let $\Omega$ be the population between-groups covariance matrix

$$\Omega = N^{-1}\sum_{a=1}^{q+1} N_a(\mu_a - \bar\mu)(\mu_a - \bar\mu)' : p \times p,$$

where $\bar\mu = N^{-1}\sum_{a=1}^{q+1} N_a\mu_a$ and $N = \sum_{a=1}^{q+1} N_a$. Let J be a subset of $\{1, \dots, p\} = J_f$. We say that the model is J when the unknown parameters satisfy

$$\operatorname{tr}\Sigma^{-1}\Omega = \operatorname{tr}\Sigma_{JJ}^{-1}\Omega_{JJ} > \operatorname{tr}\Sigma_{J'J'}^{-1}\Omega_{J'J'} \quad \text{for } J' \not\supseteq J, \qquad (3.1)$$

where $\Omega_{JJ}$ and $\Sigma_{JJ}$ are the #J × #J submatrices of $\Omega$ and $\Sigma$, respectively. We assume that the true model exists and denote it by $J_t = \{1, \dots, p_t\}$. The maximum likelihood function under the model J is known (see Fujikoshi (1983)). Hence, we have


$$G_N(J) = \mathrm{GIC}(J) - \mathrm{GIC}(J_f) = N\log\frac{|W_{JJ}|\,|W + U|}{|W|\,|W_{JJ} + U_{JJ}|} - q(p - \#J)c_N, \qquad (3.2)$$


where

$$W = \sum_{a=1}^{q+1}\sum_{i=1}^{N_a}(z_{ai} - \bar z_a)(z_{ai} - \bar z_a)' : p \times p, \qquad (3.3)$$

$$U = \sum_{a=1}^{q+1} N_a(\bar z_a - \bar z)(\bar z_a - \bar z)' : p \times p, \qquad (3.4)$$

$\bar z_a = N_a^{-1}\sum_{i=1}^{N_a} z_{ai}$ and $\bar z = N^{-1}\sum_{a=1}^{q+1} N_a\bar z_a$. Here W and U denote the within-groups and between-groups sums of squares and cross products (SP) matrices, respectively. Note that $W \sim W_p[N - q - 1, \Sigma]$ and $U \sim W_p[q, \Sigma; N\Omega]$, and recall that $S \sim W_p[N - q - 1, \Sigma]$ and $B'S_{XX}B \sim W_p[q, \Sigma; \beta'S_{XX}\beta]$ in Section 2. Let $\{S_{XX} = S_{XX}^{(N)}\}$ be a sequence satisfying Assumption 2 with γ = 2. Then, since rank $\Omega \le \min(p, q)$, we can find $\beta = \beta_N$: q × p such that $\beta'S_{XX}\beta = N\Omega$. Put S = W and $B'S_{XX}B = U$ in (2.6). This gives the correspondence between (2.6) and (3.2), except that $\beta$ depends on N.


Let $\hat J_N$ be a subset of $J_f$ minimizing (3.2), and let $\tilde J_N$ be the subset of $J_f$ defined by (2.18) in this situation.
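For discriminant analysis the same search applies with S and $B'S_{XX}B$ replaced by W and U. The sketch below computes (3.3), (3.4) and (3.2) with NumPy; the three simulated groups, whose means differ only in the first two coordinates and which share covariance I, are an illustrative assumption, as are the choice $c_N = \log N$ and the helper names. Indices are 0-based.

```python
import itertools

import numpy as np


def logdet(a):
    """log |a| for a positive definite matrix."""
    return np.linalg.slogdet(a)[1]


def within_between(groups):
    """W of (3.3) and U of (3.4) from a list of (N_a x p) sample arrays."""
    grand = np.vstack(groups).mean(axis=0)      # z_bar = N^{-1} sum_a N_a z_bar_a
    p = groups[0].shape[1]
    w, u = np.zeros((p, p)), np.zeros((p, p))
    for z in groups:
        zbar = z.mean(axis=0)
        w += (z - zbar).T @ (z - zbar)
        u += len(z) * np.outer(zbar - grand, zbar - grand)
    return w, u


def g_n(w, u, subset, n, q, c_n):
    """G_N(J) of (3.2)."""
    p = w.shape[0]
    j = np.ix_(subset, subset)
    lam = n * (logdet(w[j]) + logdet(w + u) - logdet(w) - logdet(w[j] + u[j]))
    return lam - q * (p - len(subset)) * c_n


# Illustrative data: q + 1 = 3 groups in p = 4 dimensions, covariance I.
rng = np.random.default_rng(3)
means = np.array([[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
groups = [m + rng.normal(size=(500, 4)) for m in means]
w, u = within_between(groups)
n, q, p = sum(len(z) for z in groups), len(groups) - 1, 4
c_n = np.log(n)
subsets = [c for k in range(1, p + 1) for c in itertools.combinations(range(p), k)]
j_hat = min(subsets, key=lambda sub: g_n(w, u, list(sub), n, q, c_n))
print(j_hat)
```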


THEOREM 3.1. Let $z_{ai} - \mu_a$ (i = 1, ..., $N_a$; a = 1, ..., q + 1) be i.i.d. with $E(z_{ai} - \mu_a) = 0$ and $E(z_{ai} - \mu_a)(z_{ai} - \mu_a)' = \Sigma$. Assume that the data increase so as to satisfy the condition

$$0 < m' < N^{-1}N_a < 1 \quad (a = 1, \dots, q + 1), \qquad N = \sum_{a=1}^{q+1} N_a,$$

where m' is a positive constant. Then both $\hat J_N$ and $\tilde J_N$ are strongly consistent estimators of $J_t$.


4. CANONICAL CORRELATION ANALYSIS

In this section we treat the variable selection problem in canonical correlation analysis. Let $z = (x', y')'$ follow $N_{p+q}[\mu, \Sigma]$, where x: q × 1, y: p × 1, $\mu = (\mu_x', \mu_y')'$: (p + q) × 1, $\mu_x$: q × 1,

$$\Sigma = \begin{pmatrix}\Sigma_{XX} & \Sigma_{XY}\\ \Sigma_{YX} & \Sigma_{YY}\end{pmatrix} : (p + q) \times (p + q)$$

and $\Sigma_{XX}$: q × q. Suppose we are interested in summarizing the relationship between x and y by using a small number of variables. Let $I_f = \{1, \dots, q\}$ and $J_f = \{1, \dots, p\}$ be the sets of indices of x and y, respectively. Consider subsets $I \subset I_f$ and $J \subset J_f$. We say that the model is (I, J) when, using the submatrix $\Sigma_{JI}$ of $\Sigma_{YX}$ and so on, we assume that

$$\operatorname{tr}\Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}\Sigma_{YY}^{-1} = \operatorname{tr}\Sigma_{JI}\Sigma_{II}^{-1}\Sigma_{IJ}\Sigma_{JJ}^{-1}. \qquad (4.1)$$


Further, we suppose the existence of the true model $(I_t, J_t)$, which consists of the smallest number of parameters satisfying (4.1), where $I_t = \{1, \dots, q_t\}$ and $J_t = \{1, \dots, p_t\}$. Also, let $z_i = (x_i', y_i')'$ (i = 1, ..., N) be N independent observations of z and put

$$S = \begin{pmatrix} S_{XX} & S_{XY}\\ S_{YX} & S_{YY}\end{pmatrix} = \sum_{i=1}^N (z_i - \bar z)(z_i - \bar z)' : (p + q) \times (p + q).$$


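Consider the model (I, J), where $I = \{1, \dots, q_1\}$ and $J = \{1, \dots, p_1\}$. Corresponding to I and J, we partition S into 16 submatrices $S_{ij}$ (i, j = 1, ..., 4):

$$S_{XX} = \begin{pmatrix} S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix} : q \times q, \qquad S_{XY} = \begin{pmatrix} S_{13} & S_{14}\\ S_{23} & S_{24}\end{pmatrix}, \qquad S_{YY} = \begin{pmatrix} S_{33} & S_{34}\\ S_{43} & S_{44}\end{pmatrix} : p \times p,$$

where $S_{11}$: $q_1 \times q_1$, $S_{13}$: $q_1 \times p_1$, $S_{33}$: $p_1 \times p_1$ and $S_{ji} = S_{ij}'$. Then the likelihood ratio test statistic of the model (I, J) against the full model is given by Fujikoshi (1982) as

$$\Lambda(I_f, J_f; I, J) = -2\log\lambda = N\log\left\{\frac{|S_{22\cdot1}|\,|S_{44\cdot3}|}{\begin{vmatrix} S_{22\cdot13} & S_{24\cdot13}\\ S_{42\cdot13} & S_{44\cdot13}\end{vmatrix}}\right\}, \qquad (4.2)$$

where

$$S_{ij\cdot13} = S_{ij\cdot1} - S_{i3\cdot1}S_{33\cdot1}^{-1}S_{3j\cdot1} = S_{ij\cdot3} - S_{i1\cdot3}S_{11\cdot3}^{-1}S_{1j\cdot3}, \qquad S_{ij\cdot k} = S_{ij} - S_{ik}S_{kk}^{-1}S_{kj}.$$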
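The statistic (4.2) only needs partial SP matrices. The following NumPy sketch is one way to compute it, assuming that the x-variables occupy the first q rows and columns of S and that I and J are proper nonempty subsets; the helper names and the simulated data (only $x_1$ and $y_1$ correlated, index 0 in the code) are hypothetical. Only $\Lambda$ is computed; no GIC penalty is attached.

```python
import numpy as np


def logdet(a):
    """log |a| for a positive definite matrix."""
    return np.linalg.slogdet(a)[1]


def partial(s, a, b, c):
    """S_{ab.c} = S_ab - S_ac S_cc^{-1} S_cb for index lists a, b, c."""
    s_ab = s[np.ix_(a, b)]
    if not c:
        return s_ab
    return s_ab - s[np.ix_(a, c)] @ np.linalg.solve(s[np.ix_(c, c)], s[np.ix_(c, b)])


def lam_cca(s, q, i_set, j_set, n):
    """Lambda(I_f, J_f; I, J) of (4.2); x occupies rows 0..q-1 of S, y the rest."""
    one = list(i_set)                                         # x_i, i in I
    two = [k for k in range(q) if k not in one]               # x_i, i not in I
    three = [q + j for j in j_set]                            # y_j, j in J
    four = [k for k in range(q, s.shape[0]) if k not in three]
    block = partial(s, two + four, two + four, one + three)   # [[S_22.13, S_24.13], [S_42.13, S_44.13]]
    return n * (logdet(partial(s, two, two, one))
                + logdet(partial(s, four, four, three))
                - logdet(block))


# Illustrative data: x = (x_1, x_2, x_3), y = (y_1, y_2, y_3); only x_1 and y_1 correlated.
rng = np.random.default_rng(4)
n, q = 2000, 3
x = rng.normal(size=(n, 3))
y = rng.normal(size=(n, 3))
y[:, 0] = 0.6 * x[:, 0] + 0.8 * y[:, 0]
z = np.hstack([x, y])
zc = z - z.mean(axis=0)
s = zc.T @ zc                                  # S of this section

lam_true = lam_cca(s, q, [0], [0], n)          # (I, J) = (I_t, J_t)
lam_wrong = lam_cca(s, q, [1], [0], n)         # omits x_1, so (4.1) fails
print(lam_true, lam_wrong)
```

Under the true model $\Lambda$ stays small, while a model that drops an informative variable gives a value that grows roughly linearly in N.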

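If $I \supseteq I_t$ and $J \supseteq J_t$, that is, if $q_1 \ge q_t$ and $p_1 \ge p_t$, then (4.1) holds, which yields $(\Sigma_{41\cdot3}, \Sigma_{42\cdot3}) = 0$ and $(\Sigma_{23\cdot1}, \Sigma_{24\cdot1}) = 0$. Hence, by the law of the iterated logarithm, using $\alpha_N = (N^{-1}\log\log N)^{1/2}$,

$$N^{-1}S_{22\cdot1} = \Sigma_{22\cdot1} + O(\alpha_N), \qquad N^{-1}S_{44\cdot3} = \Sigma_{44\cdot3} + O(\alpha_N), \quad \text{a.s.},$$

$$N^{-1}\begin{pmatrix} S_{22\cdot13} & S_{24\cdot13}\\ S_{42\cdot13} & S_{44\cdot13}\end{pmatrix} = \begin{pmatrix}\Sigma_{22\cdot13} & \Sigma_{24\cdot13}\\ \Sigma_{42\cdot13} & \Sigma_{44\cdot13}\end{pmatrix} + O(\alpha_N) = \begin{pmatrix}\Sigma_{22\cdot1} & 0\\ 0 & \Sigma_{44\cdot3}\end{pmatrix} + O(\alpha_N), \quad \text{a.s.},$$

$$\Lambda(I_f, J_f; I, J) = N\log\left\{\frac{|\Sigma_{22\cdot1}|\,|\Sigma_{44\cdot3}|}{\begin{vmatrix}\Sigma_{22\cdot1} & 0\\ 0 & \Sigma_{44\cdot3}\end{vmatrix}} + O(\alpha_N)\right\}, \quad \text{a.s.},$$

and

$$\Lambda(I_f, J_f; I, J) = O(\log\log N), \quad \text{a.s., if } I \supseteq I_t \text{ and } J \supseteq J_t. \qquad (4.3)$$

If $q_1 < q_t$ or $p_1 < p_t$ (which implies $I \not\supseteq I_t$ or $J \not\supseteq J_t$), then $(\Sigma_{23\cdot1}, \Sigma_{24\cdot1}) \ne 0$ or $(\Sigma_{41\cdot3}, \Sigma_{42\cdot3}) \ne 0$. Hence $|\Sigma_{22\cdot1}|\,|\Sigma_{44\cdot3}| > |\Sigma_{22\cdot13}|\,|\Sigma_{44\cdot13}|$. Therefore,

$$\Lambda(I_f, J_f; I, J) \ge N\log\frac{|\Sigma_{22\cdot1}|\,|\Sigma_{44\cdot3}|}{|\Sigma_{22\cdot13}|\,|\Sigma_{44\cdot13}|} + O(N\alpha_N) \to +\infty \quad (N \to \infty), \ \text{a.s.} \qquad (4.4)$$

This discussion is applicable in the general case $I \subset I_f$, $J \subset J_f$. In this case, let $I_* = I \cup I_t$ and $J_* = J \cup J_t$. When we restrict the variables of x and y to $x_i$ ($i \in I_*$) and $y_j$ ($j \in J_*$), the true model remains $(I_t, J_t)$.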

APPENDIX

Proof of Lemma 2.1. It suffices to prove that the (k, ℓ)-th element of $\sum_{i=1}^N (x_i - \bar x_N)e_i'$ is $O(\sqrt{N\log\log N})$, a.s. (1 ≤ k ≤ q, 1 ≤ ℓ ≤ p). Hence we may assume, without loss of generality, that q = p = 1 and $Ee_i^2 = 1$. We prove

$$\sum_{i=1}^n (x_i - \bar x_n)e_i = O(\sqrt{n\log\log n}), \quad \text{a.s.} \qquad (A.1)$$
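As a purely numerical illustration of the rate in (A.1), not a substitute for the proof, the sketch below follows one simulated path and computes $|\sum_{i\le n}(x_i - \bar x_n)e_i|/\sqrt{n\log\log n}$. The bounded uniform design and the rescaled Student-t errors (mean 0, variance 1, finite third moment) are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(5)
n_max = 2**16
x = rng.uniform(-1.0, 1.0, size=n_max)                      # bounded design (assumption)
e = rng.standard_t(df=5, size=n_max) / np.sqrt(5.0 / 3.0)   # t_5 has variance 5/3

n = np.arange(1, n_max + 1)
xbar = np.cumsum(x) / n
# sum_{i<=n} (x_i - xbar_n) e_i = sum_{i<=n} x_i e_i - xbar_n * sum_{i<=n} e_i
stat = np.cumsum(x * e) - xbar * np.cumsum(e)

mask = n >= 16                        # keeps log log n comfortably positive
ratio = np.abs(stat[mask]) / np.sqrt(n[mask] * np.log(np.log(n[mask])))
print(ratio.max(), ratio[-1])
```

The ratio stays of order one along the path, which is what (A.1) asserts almost surely in the limit.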


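To prove (A.1), by the Borel-Cantelli lemma we need to show

$$\sum_{k=1}^\infty P\Big[\bigcup_{2^{k-1} < n \le 2^k}\Big\{\Big|\sum_{i=1}^n (x_i - \bar x_n)e_i\Big| > K\sqrt{n\log\log n}\Big\}\Big] < \infty$$

for some positive constant K. If $2^{k-1} < n \le 2^k$, then

$$|\bar x_n - \bar x_{2^k}| = \Big|n^{-1}\sum_{i=1}^n (x_i - \bar x_{2^k})\Big| \le \Big\{n^{-1}\sum_{i=1}^n (x_i - \bar x_{2^k})^2\Big\}^{1/2} < M.$$

Hence, by the law of the iterated logarithm,

$$(\bar x_n - \bar x_{2^k})\sum_{i=1}^n e_i = O(\sqrt{n\log\log n}), \quad \text{a.s.}$$

Since $\sum_{i=1}^n (x_i - \bar x_n)e_i = \sum_{i=1}^n (x_i - \bar x_{2^k})e_i - (\bar x_n - \bar x_{2^k})\sum_{i=1}^n e_i$, and $\sqrt{n\log\log n}$ is of the same order as $2^{k/2}\sqrt{\log k}$ for $2^{k-1} < n \le 2^k$, we shall prove

$$\sum_{k=1}^\infty P[E_k] < \infty, \qquad (A.2)$$

where

$$E_k = \bigcup_{2^{k-1} < n \le 2^k}\Big\{\Big|\sum_{i=1}^n (x_i - \bar x_{2^k})e_i\Big| > K2^{k/2}\sqrt{\log k}\Big\}.$$

Define

$$e_{ik} = \begin{cases} e_i & \text{if } |e_i| < 2^{k/2},\\ 0 & \text{otherwise.}\end{cases}$$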
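
Let $E_k'$ be the event $E_k$ with $e_i$ replaced by $e_{ik}$. Then

$$P(E_k) \le P(E_k') + P\Big[\bigcup_{2^{k-1} < n \le 2^k}\bigcup_{i=1}^n (e_i \ne e_{ik})\Big],$$

where the second term is summable:

$$\sum_{k=1}^\infty P\Big[\bigcup_{i=1}^{2^k}(e_i \ne e_{ik})\Big] \le \sum_{k=1}^\infty 2^kP[|e_1| \ge 2^{k/2}] = \sum_{k=1}^\infty 2^k\sum_{\ell=k}^\infty P[2^{\ell/2} \le |e_1| < 2^{(\ell+1)/2}]$$

$$= \sum_{\ell=1}^\infty\Big(\sum_{k=1}^{\ell} 2^k\Big)P[2^{\ell/2} \le |e_1| < 2^{(\ell+1)/2}] \le \sum_{\ell=1}^\infty 2^{\ell+1}P[2^{\ell/2} \le |e_1| < 2^{(\ell+1)/2}]$$

$$\le 2\sum_{\ell=1}^\infty E\,e_1^2I[2^{\ell/2} \le |e_1| < 2^{(\ell+1)/2}] \le 2Ee_1^2 = 2.$$

Moreover, since $Ee_1 = 0$,

$$|Ee_{ik}| = |E(e_{ik} - e_1)| \le E|e_1|I[|e_1| \ge 2^{k/2}] \le 2^{-k/2}Ee_1^2 = 2^{-k/2},$$

$$\Big|\sum_{i=1}^n (x_i - \bar x_{2^k})Ee_{ik}\Big| \le n\Big\{n^{-1}\sum_{i=1}^n (x_i - \bar x_{2^k})^2\Big\}^{1/2}2^{-k/2} \le 2^{k/2}M$$

for large n. If we let $e_{ik}^* = e_{ik} - Ee_{ik}$ and $T_n = \sum_{i=1}^n (x_i - \bar x_{2^k})e_{ik}^*$, we obtain

$$P(E_k') \le P\Big[\bigcup_{2^{k-1} < n \le 2^k}\Big\{|T_n| \ge K2^{k/2}\sqrt{\log k} - \Big|\sum_{i=1}^n (x_i - \bar x_{2^k})Ee_{ik}\Big|\Big\}\Big]$$

$$\le P\Big[\bigcup_{2^{k-1} < n \le 2^k}\{|T_n| \ge K2^{k/2}\sqrt{\log k} - 2^{k/2}M\}\Big] \le P\Big[\bigcup_{2^{k-1} < n \le 2^k}\{|T_n| \ge K'2^{k/2}\sqrt{\log k}\}\Big] = P[F_k],$$

where we can take a new constant K' > 0 if K > 0 is sufficiently large.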


Therefore,

$$P(F_k) \le P\Big[\Big|\sum_{i=1}^{2^k}(x_i - \bar x_{2^k})e_{ik}^*\Big| > K'2^{k/2}\sqrt{\log k}\Big] \le 2\{1 - \Phi(K'\sqrt{\log k})\} + C_0R_k,$$

where

$$R_k = \frac{\sum_{i=1}^{2^k}|x_i - \bar x_{2^k}|^3\,E|e_{ik}^*|^3}{2^{3k/2}(1 + K'\sqrt{\log k})^3},$$

$\Phi(x)$ is the standard normal distribution function and $C_0$ is a constant independent of n. The last inequality is due to Bikelis (1966). If $K' > \sqrt2$, then we know that

$$\sum_{k=1}^\infty\{1 - \Phi(K'\sqrt{\log k})\} < \infty.$$
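The condition on K' follows from the standard Gaussian tail bound $1 - \Phi(t) \le \tfrac12 e^{-t^2/2}$ for $t \ge 0$, applied with $t = K'\sqrt{\log k}$:

$$\sum_{k=1}^\infty\{1 - \Phi(K'\sqrt{\log k})\} \le \frac12\sum_{k=1}^\infty e^{-K'^2\log k/2} = \frac12\sum_{k=1}^\infty k^{-K'^2/2} < \infty \quad\text{whenever } K'^2/2 > 1, \text{ i.e. } K' > \sqrt2.$$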
k=l 

If γ = 3,

$$\sum_{k=1}^\infty R_k \le C_1\sum_{k=2}^\infty k^{-1}(\log k)^{-3/2} < \infty.$$

If 2 < γ < 3, we split $E|e_1|^3I[|e_1| < 2^{k/2}]$ over the shells $2^{(\ell-1)/2} \le |e_1| < 2^{\ell/2}$ ($\ell = 1, \dots, k$) and interchange the order of summation, which gives

$$\sum_{k=1}^\infty R_k \le C_2E|e_1|^\gamma + C_3 < \infty,$$

because $E|e_1|^\gamma < \infty$; here $C_1$, $C_2$ and $C_3$ are positive constants. This completes the proof of (A.2).


REFERENCES

[1] AKAIKE, H. (1973). Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory (B.N. Petrov and F. Csáki, Eds.). Akadémiai Kiadó, Budapest, 267-281.

[2] AKAIKE, H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math., 30, 9-14.

[3] BIKELIS, A. (1966). Estimates of the remainder term in the central limit theorem. Litovsk. Mat. Sb., 6(3), 323-346.

[4] FUJIKOSHI, Y. (1982). A test for additional information in canonical correlation analysis. Ann. Inst. Statist. Math., 34, 523-530.

[5] FUJIKOSHI, Y. (1983). A criterion for variable selection in multiple discriminant analysis. Hiroshima Math. J., 13, 203-214.

[6] FUJIKOSHI, Y. and NISHII, R. (1986). Selection of variables in a multivariate inverse regression problem. Hiroshima Math. J., 16, 269-277.

[7] HANNAN, E.J. and QUINN, B.G. (1979). The determination of the order of an autoregression. J. Roy. Statist. Soc. B, 41, 190-195.

[8] McKAY, R.J. (1977). Simultaneous procedures for variable selection in multiple discriminant analysis. Biometrika, 64, 283-290.

[9] NISHII, R. (1986). Criteria for selection of response variables and the asymptotic properties in a multivariate calibration. Ann. Inst. Statist. Math., 38, 319-329.

[10] RAO, C.R. (1973). Linear Statistical Inference and Its Applications. John Wiley, New York.

[11] RISSANEN, J. (1978). Modeling by shortest data description. Automatica, 14, 465-471.

[12] SCHWARZ, G. (1978). Estimating the dimension of a model. Ann. Statist., 6, 461-464.

[13] ZHAO, L.C., KRISHNAIAH, P.R. and BAI, Z.D. (1986). On detection of the number of signals in presence of white noise. J. Multivariate Anal., 20, 1-25.


